ABSTRACT

Methods for calculating the statistical significance 
of excess events and the interpretation of the 
formally derived values are discussed. It is argued 
that a simple formula for a conservative estimate 
should generally be used in order to provide a common 
understanding of quoted values. 

1. Introduction. Substantial nonuniformity exists in the cosmic ray
literature with respect to how the statistical significance of features or 
excess events is being calculated (e.g. point sources, spectral lines, 
light curves). Consequently, there is no mutual understanding about what 
the confidence in some result might really be when a number of 'standard 
deviations' are being quoted. Some of the proposed procedures for 
calculation need to be taken with caution. On the other hand, there is a 
clear need for the adoption of a standard method to allow the reliable 
intercomparison of quoted results and create a common understanding of the 
associated confidence. 

A number of methods and formulae have been proposed together with 
sometimes extended mathematical derivation or justification (Ref. 1-4).
It has become clear, however, that some of these methods need to be taken with
caution. On the other hand, there is a very simple formula which is being
widely used by X-ray astronomers providing a common understanding. 

2. Statistical Significance. An example of the statistical situation
which we wish to discuss is given in Figure 1.


Numbers of events $x_i$ are plotted versus bin
number $i = 1, \ldots, n$, corresponding to
intervals of some physical variable (e.g.
energy, phase, electric charge, time,
...). In the example given there seem to
be 'excess events' in bins 1 and 2 as
compared to the 'background' defined by
the other bins. The excess is

$ON - \alpha\,OFF$,

where ON and OFF are the integrated counts
in channels 1 to 2 and in channels 3 to n,
respectively, and $\alpha$ is the ratio of the
corresponding numbers of bins, here
$2/(n-2)$.
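
The bookkeeping described above can be sketched in a few lines. This is only an illustration: the function name and the bin contents are hypothetical (they are not the data of Figure 1), chosen so that the totals equal the ON = 14, OFF = 16 and $\alpha$ = 2/10 used in the numerical example of this paper.

```python
def excess_counts(counts, n_on):
    """Return (ON, OFF, alpha, excess) for a histogram whose first n_on bins
    form the 'on' region and whose remaining bins define the background."""
    if not 0 < n_on < len(counts):
        raise ValueError("need at least one on bin and one off bin")
    on = sum(counts[:n_on])                # integrated counts in the on bins
    off = sum(counts[n_on:])               # integrated counts in the off bins
    alpha = n_on / (len(counts) - n_on)    # ratio of the numbers of bins
    return on, off, alpha, on - alpha * off


# Hypothetical 12-bin histogram (illustrative values only).
counts = [8, 6, 2, 1, 3, 0, 2, 1, 2, 2, 1, 2]
on, off, alpha, excess = excess_counts(counts, n_on=2)
print(on, off, alpha, excess)
```

With n = 12 bins and two 'on' bins, the ratio of bin numbers is 2/(n-2) = 2/10.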



Fig. 1. Statistical example. 
The general questions then are:

1. Does the excess correspond to the presence of a physical 
signal? 

2. What is the 'significance of the signal'?

It is important to distinguish between these two questions.
They correspond to the assumption that one out of two
alternative hypotheses is true:

the null hypothesis H0 is that there really is only
background,

the hypothesis H1 is that a true signal exists in
addition to background.

When a statement is made about a statistical situation, it
should be clear under which hypothesis this statement holds.


The first of the two questions may be answered by giving the
probability for a chance occurrence of the observed excess by
a statistical fluctuation (under H0). It is of course
necessary to use the proper statistic (e.g. binomial
statistic for small numbers of events). If a low probability
for the chance occurrence of 'excess events' is found, it is
then usually concluded that the presence of a 'physical
signal' is likely. From there on hypothesis one is advocated
and all statements made should refer to H1.
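
The chance probability under H0 can be evaluated directly. The following is a minimal sketch, assuming Poissonian counting statistic (the text notes that the binomial statistic is the proper one for small numbers of events). The function name is ours, and the inputs, 14 observed events against an expected background of 6, are the values that appear in the numerical example of this paper.

```python
import math


def poisson_tail(n_obs, mean):
    """Probability of observing n_obs or more counts from a Poisson
    distribution with the given mean (chance occurrence under H0)."""
    if n_obs <= 0:
        return 1.0
    # Sum the probabilities of 0 .. n_obs-1 counts and take the complement.
    below = sum(math.exp(-mean) * mean**k / math.factorial(k)
                for k in range(n_obs))
    return 1.0 - below


p_chance = poisson_tail(14, 6.0)
print(f"{p_chance:.1e}")   # 3.6e-03
```

The result agrees with the Poissonian value of 3.6 x 10^-3 quoted in the example.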

Only under H1 should the term significance be used. In
particular the often used formula $(ON - \alpha\,OFF)/(\alpha\sqrt{OFF})$ is
useless (as are a number of other formulae, see e.g. (4)).
Also the probability which answers the first question should
not be converted into a significance (as is sometimes done
by using the integrated Gaussian distribution, even in cases
where the Gaussian statistic does not apply).

In answering the second question then the presence of a
signal is assumed (H1). The 'significance of the signal' k
can be defined as the ratio of the best estimate of the
signal to its uncertainty. In the case of Poissonian
counting statistic, for which the variance is equal to the
mean, a straightforward error propagation leads to the well
known formula (in terms of the above defined variables):

'significance' $k = \frac{ON - \alpha\,OFF}{\sqrt{ON + \alpha\,OFF}}$   [1]

in units of standard deviations $\sigma$ (see Ref. (3, 5, 6, 7); note
that in (3) this formula is interpreted incorrectly).
Formula [1] may also be derived by using the more
complicated maximum likelihood ratio (6).
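
As a hedged illustration, formula [1] can be written as a small function. The function name is ours, and the sketch assumes the variance term ON + $\alpha$ OFF. For $\alpha$ < 1 this term is larger, and hence more conservative, than the ON + $\alpha^2$ OFF that strict Poisson error propagation of ON - $\alpha$ OFF would give. The inputs are those of the numerical example in this paper.

```python
import math


def significance(on, off, alpha):
    """Formula [1]: excess ON - alpha*OFF divided by sqrt(ON + alpha*OFF),
    in units of standard deviations."""
    variance = on + alpha * off            # variance term assumed in formula [1]
    if variance <= 0:
        raise ValueError("no counts: significance is undefined")
    return (on - alpha * off) / math.sqrt(variance)


k = significance(on=14, off=16, alpha=2 / 10)
print(f"{k:.1f}")   # 2.6
```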
A general criticism of the work of (3) and to some extent of
(1) and (2) is given in (6). While it is very important not
to overestimate statistical significance, Ref. (3) does too
much, leading to an underestimate.

More recently, (5) has contributed significantly to the
confusion by trying to show that formula [1] is incorrect
and should be replaced by another complicated formula. The
main argument is that the new formula fits much better to
Monte Carlo simulations than formula [1] does. The whole
discussion is misleading and suffers from the fact that no
distinction between H1 and H0 is made: while formula [1]
refers to H1, the Monte Carlo simulations as well as the new
formula refer to H0, so their distributions are necessarily
different.
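
The difference between the two hypotheses can be made concrete with a small simulation. This is a sketch under our own assumptions and not a reproduction of the simulations in (5): the background level, the signal strength, $\alpha$, the number of trials and the seed are all hypothetical, and formula [1] is taken with the variance term ON + $\alpha$ OFF.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def formula_1(on, off, alpha):
    """Formula [1] with the variance term ON + alpha*OFF."""
    return (on - alpha * off) / math.sqrt(on + alpha * off)


def mean_k(rng, signal, background_off=100.0, alpha=0.2, trials=2000):
    """Average value of formula [1] over simulated ON/OFF measurements."""
    total = 0.0
    for _ in range(trials):
        off = poisson_sample(rng, background_off)
        on = poisson_sample(rng, alpha * background_off + signal)
        total += formula_1(on, off, alpha)
    return total / trials


rng = random.Random(1985)                 # fixed seed for reproducibility
mean_h0 = mean_k(rng, signal=0.0)         # H0: background only
mean_h1 = mean_k(rng, signal=20.0)        # H1: background plus a true signal
print(f"mean k under H0: {mean_h0:+.2f}, under H1: {mean_h1:+.2f}")
```

Under H0 the values scatter around zero, while under H1 they scatter around the significance of the injected signal, so a formula tuned to the H0 distribution describes a different quantity.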

For the example given in Fig. 1 (with a unit of 1 for the
scale of counts $x_i$) the two questions can be answered as
follows:

1. The probability (under H0) for a chance occurrence of 14
events in bins 1 and 2 with an average rate of 6 in two
bins is $\lesssim 10^{-3}$, using binomial statistic (note that
Poissonian statistic gives the somewhat larger probability
of $3.6 \times 10^{-3}$).

2. If one feels that the probability of $10^{-3}$ is low enough
to postulate the existence of a physical signal (H1), then
the significance of this signal is

$k = \frac{14 - (2/10) \cdot 16}{\sqrt{14 + (2/10) \cdot 16}} = 2.6$ standard deviations.

To put it in other words, we again consider Figure 2.



Figure 2. Representation of event number distributions.
If the signal $ON - \alpha\,OFF$ is compared to the standard
deviation of the background, $\alpha\sqrt{OFF}$, one gets an estimate for
the chance occurrence under the null hypothesis H0.
If, on the other hand, $ON - \alpha\,OFF$ is compared to both the
standard deviations of the background and the signal, as is
done by formula [1] under H1, one gets a different estimate.
This is related to the probability that a second measurement
(under identical conditions) will lead to a null result
($ON \le \alpha\,OFF$). It is this estimate that should be called
'significance of the detected signal'.
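
The two estimates can be put side by side numerically. The sketch below uses hypothetical counts with equal on and off coverage ($\alpha$ = 1), an assumption made for illustration only, and the function names are ours.

```python
import math


def excess_over_background_sigma(on, off, alpha):
    """Excess compared to the standard deviation of the background alone
    (statement under H0)."""
    return (on - alpha * off) / (alpha * math.sqrt(off))


def excess_over_total_sigma(on, off, alpha):
    """Excess compared to the combined fluctuations of signal and background,
    as in formula [1] (statement under H1)."""
    return (on - alpha * off) / math.sqrt(on + alpha * off)


# Hypothetical measurement: 120 on-counts, 100 off-counts, alpha = 1.
under_h0 = excess_over_background_sigma(120, 100, 1.0)
under_h1 = excess_over_total_sigma(120, 100, 1.0)
print(f"{under_h0:.2f} {under_h1:.2f}")   # 2.00 1.35
```

The same excess of 20 counts yields the smaller value under H1, because the fluctuations of the ON counts enter the denominator as well.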


3. Final remarks

Values of significance in units of standard deviations are
usually quoted when the detection of some signal is claimed.
Consequently, a formula referring to H1 (existence of a
signal) should be used.

Formula [1] has been widely adopted by X-ray astronomers
and has as such served successfully as a standard allowing
the reliable intercomparison of stated values of
significance. It is up to the individual from what level of
significance onward one starts to 'believe' in some reported
result. Our personal view is that using formula [1] a
minimum significance of 3 standard deviations (better yet 5)
should be reached.